Add Glossary of Terms to Understanding Airbyte #6235

Merged
merged 3 commits into from
Sep 21, 2021

Conversation

avaidyanatha
Contributor

Main Changes

  • There are a lot of terms floating around that we can define in one place.

@avaidyanatha avaidyanatha added the area/documentation Improvements or additions to documentation label Sep 17, 2021

**Extract**: Retrieve data from a [source](../integrations/sources), which can be an application, database, anything really.

**Transform**: Clean up the data. This is referred to as [normalization](./basic-normalization.md) in Airbyte and involves [deduplication](./connections/incremental-deduped-history.md), changing data types, formats, and more.

Contributor

Can you put Transform 3rd in the list?

Contributor Author

Done!


### Full Refresh Sync

A **Full Refresh Sync** will attempt to retrieve all data from the destination every time a sync is run. Then there are two choices, **Overwrite** and **Append**. **Overwrite** deletes the data in the destination before running the sync and **Append** doesn't.

Contributor

Suggested change:
- A **Full Refresh Sync** will attempt to retrieve all data from the destination every time a sync is run. Then there are two choices, **Overwrite** and **Append**. **Overwrite** deletes the data in the destination before running the sync and **Append** doesn't.
+ A **Full Refresh Sync** will attempt to retrieve all data from the source every time a sync is run. Then there are two choices, **Overwrite** and **Append**. **Overwrite** deletes the data in the destination before running the sync and **Append** doesn't.

Contributor Author

Ah good catch, fixed!
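
For readers of this thread, the Overwrite/Append distinction in the corrected wording can be sketched in a few lines of Python. This is an illustration only, under the assumption that a sync is modelled as copying a list of rows; the function name `full_refresh` and the row shape are hypothetical and are not Airbyte code.

```python
def full_refresh(source_rows, destination_rows, mode):
    """Return the destination contents after one full refresh sync.

    Hypothetical model: every sync reads *all* rows from the source.
    """
    if mode == "overwrite":
        # Overwrite: existing destination data is deleted before the sync.
        return list(source_rows)
    if mode == "append":
        # Append: existing destination data is kept; all source rows are added.
        return list(destination_rows) + list(source_rows)
    raise ValueError(f"unknown mode: {mode}")


source = [{"id": 1}, {"id": 2}]
destination = [{"id": 1}]  # left over from an earlier sync

overwritten = full_refresh(source, destination, "overwrite")
appended = full_refresh(source, destination, "append")
```

In this model, Append leaves the earlier copy of `{"id": 1}` in place next to the newly read one, which is why repeated Append syncs accumulate duplicates.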


### Incremental Sync

An **Incremental Sync** will only retrieve new data everytime the a sync occurs. The first sync will always attempt to retrieve all the data. If the [destination supports it](https://discuss.airbyte.io/t/what-destinations-support-the-incremental-deduped-sync-mode/89), you can have your data deduplicated. Simply put, this just means that if you sync an updated version of a record you've already synced, it will remove the old record.

Contributor

Suggested change:
- An **Incremental Sync** will only retrieve new data everytime the a sync occurs. The first sync will always attempt to retrieve all the data. If the [destination supports it](https://discuss.airbyte.io/t/what-destinations-support-the-incremental-deduped-sync-mode/89), you can have your data deduplicated. Simply put, this just means that if you sync an updated version of a record you've already synced, it will remove the old record.
+ An **Incremental Sync** will only retrieve new data from the source every time a sync occurs. The first sync will always attempt to retrieve all the data. If the [destination supports it](https://discuss.airbyte.io/t/what-destinations-support-the-incremental-deduped-sync-mode/89), you can have your data deduplicated. Simply put, this just means that if you sync an updated version of a record you've already synced, it will remove the old record.

Contributor Author

Fixed!
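
The incremental behaviour described in this thread can be illustrated with a small sketch. It assumes each record carries an `id` and an `updated_at` value used as a cursor; the function `incremental_sync` and those field names are hypothetical and chosen only for the example, not taken from Airbyte.

```python
def incremental_sync(source_rows, destination_rows, cursor, dedupe=False):
    """Copy only rows newer than the cursor; return (destination, new_cursor).

    Hypothetical model: a cursor of None means nothing has been synced yet,
    so the first sync retrieves everything.
    """
    new_rows = [
        row for row in source_rows
        if cursor is None or row["updated_at"] > cursor
    ]
    result = list(destination_rows) + new_rows
    if dedupe:
        # Keep only the most recently synced version of each id.
        latest = {}
        for row in result:
            latest[row["id"]] = row
        result = list(latest.values())
    new_cursor = max((row["updated_at"] for row in new_rows), default=cursor)
    return result, new_cursor


# First sync: the cursor is empty, so all source data is retrieved.
source = [{"id": 1, "updated_at": 1, "name": "a"}]
dest, cursor = incremental_sync(source, [], None, dedupe=True)

# Second sync: record 1 was updated and record 2 is new.
source = [
    {"id": 1, "updated_at": 2, "name": "a2"},
    {"id": 2, "updated_at": 2, "name": "b"},
]
dest, cursor = incremental_sync(source, dest, cursor, dedupe=True)
```

With `dedupe=False` the second sync would leave both versions of record 1 in the destination; with `dedupe=True` the older version is removed.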


### DAG

DAG stands for **Directed Acyclic Graph**. It's an overly fancy term originally coined by math graph theorists that just describes a tree-like process. For example, in the following diagram, you start at A and can choose B or C, which then proceed to D and E, respectively. This kind of structure is great for representing workflows and is what tools like [Airflow](https://airflow.apache.org/) use to orchestrate the execution of software based on different cases or states.

Contributor

Suggested change:
- DAG stands for **Directed Acyclic Graph**. It's an overly fancy term originally coined by math graph theorists that just describes a tree-like process. For example, in the following diagram, you start at A and can choose B or C, which then proceed to D and E, respectively. This kind of structure is great for representing workflows and is what tools like [Airflow](https://airflow.apache.org/) use to orchestrate the execution of software based on different cases or states.
+ DAG stands for **Directed Acyclic Graph**. It's a term originally coined by math graph theorists that describes a tree-like process. For example, in the following diagram, you start at A and can choose B or C, which then proceed to D and E, respectively. This kind of structure is great for representing workflows and is what tools like [Airflow](https://airflow.apache.org/) use to orchestrate the execution of software based on different cases or states.

Contributor

You should mention that there cannot be a loop inside.

Contributor

We want to make things simple but we shouldn't belittle definitions (overly fancy)!

Contributor Author

Updated it - I'll hold back my pot shots at terminology :)
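
The "no loop" property raised in this thread can be shown with Python's standard-library `graphlib` module (Python 3.9+). The sketch below encodes the A/B/C/D/E example from the definition; the helper names `run_order` and `has_cycle` are made up for illustration, and this says nothing about how Airflow or Airbyte represent DAGs internally.

```python
from graphlib import CycleError, TopologicalSorter

# Each key maps a node to the set of nodes that must come before it.
# This encodes A -> B -> D and A -> C -> E from the example.
dag = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B"}, "E": {"C"}}


def run_order(graph):
    """Return one valid execution order; raises CycleError if a loop exists."""
    return list(TopologicalSorter(graph).static_order())


def has_cycle(graph):
    """Report whether the graph contains a loop, i.e. is not a DAG."""
    try:
        run_order(graph)
    except CycleError:
        return True
    return False


order = run_order(dag)

# Making A depend on D closes the loop A -> B -> D -> A,
# so the result is no longer acyclic.
looped = dict(dag, A={"D"})
```

Because the graph has no loops, every node can be placed after all of its predecessors, which is what makes the structure usable for ordering workflow steps.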

This is only relevant for individuals who want to create a connector.
{% endhint %}

This refers to how you define the data that you can retrieve from a Source. For example, if you want to retrieve `Account` data from your [Salesforce Source](../integrations/sources/salesforce.md), it needs to be defined clearly so that it can be translated to the destination. Learn more [here](./beginners-guide-to-catalog.md).

Contributor

You could introduce the concept of a schema. That would make the definition clearer.

Contributor Author

Makes sense, I've updated it with a definition I think non-technical people can understand too.
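
To make the schema idea from this thread concrete, here is a deliberately simplified sketch. The `account_stream` structure, its field names, and the `conforms` helper are hypothetical and are not Airbyte's actual catalog format; the point is only that a schema names the fields of a record and their types so that the destination knows what to expect.

```python
# Hypothetical, simplified description of the data in one stream.
account_stream = {
    "name": "Account",
    "fields": {"Id": str, "Name": str, "AnnualRevenue": float},
}


def conforms(record, stream):
    """Check that a record has exactly the declared fields with the declared types."""
    fields = stream["fields"]
    if set(record) != set(fields):
        return False
    return all(isinstance(record[name], kind) for name, kind in fields.items())


good = conforms({"Id": "001", "Name": "Acme", "AnnualRevenue": 1.5e6}, account_stream)
bad = conforms({"Id": 1, "Name": "Acme"}, account_stream)
```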

docs/SUMMARY.md Outdated
@@ -193,6 +193,7 @@
* [Templates](contributing-to-airbyte/templates/README.md)
* [Connector Doc Template](contributing-to-airbyte/templates/integration-documentation-template.md)
* [Understanding Airbyte](understanding-airbyte/README.md)
* [Glossary of Terms](understanding-airbyte/glossary.md)

Contributor

this should probably be at the bottom of the list since it's more of a reference rather than something that people read end-to-end

Contributor Author

Makes sense, done.

@@ -0,0 +1,54 @@
# Glossary of Terms

### ETL/ELT

Contributor

Can we put this in alphabetical order?

Contributor Author

Yep done.

4 participants